Prototype(symbolization): Add symbolization in Pyroscope read path #3799

marcsanmi · 2024-12-20T11:10:06Z

Context

This PR introduces a comprehensive implementation for DWARF symbolization of unsymbolized profiles in the Pyroscope read path. It enables automatic symbolization of profiles for non-customer code (primarily open-source libraries and binaries) where symbol information isn't available at collection time.

Symbolization

DWARF parsing: Optimized parsing of debug information with minimal memory overhead
Comprehensive symbol resolution: Support for function names, file paths, and line numbers
Inline function resolution: Proper handling of inlined functions for accurate stack traces
Address-based lookup: Fast address-to-symbol mapping with optimized data structures
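
As a rough illustration of the address-based lookup described above, here is a minimal sketch that assumes symbols are kept as sorted, non-overlapping address ranges and resolved with a binary search. The `FuncRange` and `AddrTable` types are hypothetical and are not taken from this PR; inline-function expansion is deliberately left out.

```go
package main

import (
	"fmt"
	"sort"
)

// FuncRange is a hypothetical record: a half-open address range [Start, End)
// together with the function name, source file, and line it resolves to.
type FuncRange struct {
	Start, End uint64
	Name       string
	File       string
	Line       int
}

// AddrTable holds ranges sorted by Start. Ranges are assumed not to overlap.
type AddrTable struct {
	ranges []FuncRange
}

// NewAddrTable copies the input and sorts it by start address.
func NewAddrTable(rs []FuncRange) *AddrTable {
	sorted := append([]FuncRange(nil), rs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Start < sorted[j].Start })
	return &AddrTable{ranges: sorted}
}

// Lookup returns the range containing addr, if any, in O(log n).
func (t *AddrTable) Lookup(addr uint64) (FuncRange, bool) {
	// Find the first range that starts after addr; the candidate is the one before it.
	i := sort.Search(len(t.ranges), func(i int) bool { return t.ranges[i].Start > addr })
	if i == 0 {
		return FuncRange{}, false
	}
	r := t.ranges[i-1]
	if addr >= r.End {
		return FuncRange{}, false // addr falls in a gap between ranges
	}
	return r, true
}

func main() {
	tbl := NewAddrTable([]FuncRange{
		{Start: 0x2000, End: 0x2080, Name: "bar", File: "bar.c", Line: 42},
		{Start: 0x1000, End: 0x1040, Name: "foo", File: "foo.c", Line: 10},
	})
	if r, ok := tbl.Lookup(0x1010); ok {
		fmt.Printf("%s %s:%d\n", r.Name, r.File, r.Line)
	}
}
```

A sorted slice keeps the table compact and cache-friendly; a real implementation would also need to return a chain of frames for inlined calls.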

Multi-level Caching

In-memory symbol cache: LRU cache for frequently accessed symbols
Object storage for debug files: Persistent storage of debug files in a configurable object storage backend
Configurable TTL: Control over cache expiration for both memory and storage caches
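
To make the in-memory layer concrete, the following is a minimal sketch of an LRU cache with a per-entry TTL, built only on the standard library. `SymbolCache` and its methods are hypothetical names for illustration; the PR itself may use a library cache with different semantics, and this sketch is not safe for concurrent use.

```go
package main

import (
	"container/list"
	"fmt"
	"time"
)

// entry is one cached symbol together with its expiration time.
type entry struct {
	key     string
	value   string
	expires time.Time
}

// SymbolCache is a hypothetical LRU cache with a per-entry TTL.
type SymbolCache struct {
	capacity int
	ttl      time.Duration
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func NewSymbolCache(capacity int, ttl time.Duration) *SymbolCache {
	return &SymbolCache{
		capacity: capacity,
		ttl:      ttl,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns the cached value and marks it as recently used.
// Expired entries are dropped and reported as a miss.
func (c *SymbolCache) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	e := el.Value.(*entry)
	if !time.Now().Before(e.expires) {
		c.order.Remove(el)
		delete(c.items, key)
		return "", false
	}
	c.order.MoveToFront(el)
	return e.value, true
}

// Put inserts or refreshes a value, evicting the least recently used
// entry once the capacity is exceeded.
func (c *SymbolCache) Put(key, value string) {
	if el, ok := c.items[key]; ok {
		e := el.Value.(*entry)
		e.value, e.expires = value, time.Now().Add(c.ttl)
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value, expires: time.Now().Add(c.ttl)})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := NewSymbolCache(2, time.Hour)
	c.Put("buildid:0x1010", "foo")
	c.Put("buildid:0x2010", "bar")
	c.Get("buildid:0x1010")        // touch: now the most recently used entry
	c.Put("buildid:0x3010", "baz") // evicts the least recently used entry
	_, ok := c.Get("buildid:0x2010")
	fmt.Println("0x2010 still cached:", ok)
}
```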

Integration Points

Read path symbolization: Automatically symbolize profiles during query time
Remote debug info fetching: Integration with debuginfod for symbol discovery from public servers
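
For reference, a debuginfod server serves debug info over plain HTTP at `/buildid/<BUILDID>/debuginfo`. The sketch below shows one way a client could build that URL and map a 404 to a sentinel error. `debuginfoURL`, `fetchDebuginfo`, and `ErrNotFound` are hypothetical names, not the functions in this PR, and `main` only prints the URL rather than contacting a server.

```go
package main

import (
	"context"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// ErrNotFound signals that the server has no debug info for the build ID.
var ErrNotFound = errors.New("debuginfo not found")

// debuginfoURL builds the debuginfod endpoint for a build ID.
// The build ID must be a non-empty hex string, which also rules out
// path traversal through a crafted ID.
func debuginfoURL(base, buildID string) (string, error) {
	if _, err := hex.DecodeString(buildID); err != nil || buildID == "" {
		return "", fmt.Errorf("invalid build ID %q", buildID)
	}
	return strings.TrimRight(base, "/") + "/buildid/" + buildID + "/debuginfo", nil
}

// fetchDebuginfo downloads the debug file for buildID into memory.
func fetchDebuginfo(ctx context.Context, client *http.Client, base, buildID string) ([]byte, error) {
	u, err := debuginfoURL(base, buildID)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	switch resp.StatusCode {
	case http.StatusOK:
		return io.ReadAll(resp.Body)
	case http.StatusNotFound:
		return nil, ErrNotFound
	default:
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
}

func main() {
	u, err := debuginfoURL("https://debuginfod.elfutils.org", "fbce2598b34f1cf8d0c899f34c2218864e1da6d1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)
}
```

Returning a distinct error for 404 lets the caller remember negative results instead of asking the server again for the same build ID.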

Configuration Example

symbolizer:
  enabled: true
  debuginfod_url: "https://debuginfod.elfutils.org"
  in_memory_symbol_cache_size: 100000         # Symbol cache in memory (entries)
  in_memory_debuginfo_cache_size: 2147483648  # Debug info cache in memory (bytes)
  persistent_debuginfo_store:                 # Debug info in persistent storage
    enabled: true
    max_age: 168h
    storage:                                  # Storage backend configuration
      backend: s3
      s3:
        bucket_name: debug-symbols-bucket
        endpoint: s3.amazonaws.com
        access_key_id: ${S3_ACCESS_KEY}
        secret_access_key: ${S3_SECRET_KEY}

korniltsev

It would be nice to have some benchmarks of symbolizing different numbers of locations and different file sizes. I think they can help us pick the right place and architecture for using this.

pkg/experiment/symbolization/debuginfod_client.go

pkg/phlaredb/symdb/resolver.go

kolesnikovae

Good work, Marc! I'm excited to see some experimental results 🚀

I think we can implement a slightly more optimized version for production use:

sequenceDiagram
    autonumber

    participant QF as Query Frontend
    participant M  as Metastore
    participant QB as Query Backend
    participant SYM as Symbolizer

    QF ->>+M: Query Metadata
    Note left of M: Build identifiers are returned<br> along with the metadata records
    M ->>-QF: 

    par
        QF ->>+SYM: Request for symbolication
        Note left of SYM: Prepare symbols for<br>the objects requested
    and
        QF ->>+QB: Data retrieval and aggregation
        Note left of QB: The main data path<br>Might be serverless
    end

    QB ->>-QF: Data in pprof format
    Note over QF: Because of the truncation,<br> only a limited set of locations<br>make it here (16K by default) 

    QF --)SYM: Location addresses
    
    SYM ->>-QF: Symbols
    
    QF ->>QF: Flame graph rendering

Even without a parallel pipeline and dedicated symbolication service, we could implement something like this:

sequenceDiagram
    autonumber

    participant QF as Query Frontend
    participant M  as Metastore
    participant QB as Query Backend
    participant SYM as Symbols

    QF ->>+M: Query Metadata
    Note left of M: No build identifiers are returned
    M ->>-QF: 

    QF ->>+QB: Data retrieval and aggregation
    Note left of QB: The main data path<br>Might be serverless

    QB ->>-QF: Data in pprof format
    Note over QF: Because of the truncation,<br> only a limited set of locations<br>make it here (16K by default)

    QF ->>+SYM: Fetch symbols
    SYM ->>-QF: Symbols
    Note over QF: In terms of the added latency,<br>this approach is not worse than<br>block level symbolication
    
    QF ->>QF: Flame graph rendering

I think we should avoid symbolization at the block level if the symbols are not already present in the block itself. Otherwise, this approach leads to excessive processing, increased latency, and higher resource usage. Please keep in mind that a query may span many thousands of blocks.
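
To illustrate why symbolizing once after aggregation is cheaper, here is a minimal sketch that collects the unique addresses per build ID from an already merged and truncated profile. The `Mapping` and `Location` types are simplified, hypothetical stand-ins for the pprof structures, and `addressesByBuildID` is not a function from this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// Mapping and Location are simplified stand-ins for the pprof structures.
type Mapping struct {
	ID      uint64
	BuildID string
}

type Location struct {
	MappingID uint64
	Address   uint64
}

// addressesByBuildID groups the distinct addresses that survived truncation
// by the build ID of their mapping. Mappings that no location refers to
// never show up, so no debug file would be requested for them.
func addressesByBuildID(mappings []Mapping, locations []Location) map[string][]uint64 {
	buildIDs := make(map[uint64]string, len(mappings))
	for _, m := range mappings {
		buildIDs[m.ID] = m.BuildID
	}
	seen := make(map[string]map[uint64]struct{})
	for _, loc := range locations {
		id, ok := buildIDs[loc.MappingID]
		if !ok || id == "" {
			continue // nothing to look up without a build ID
		}
		if seen[id] == nil {
			seen[id] = make(map[uint64]struct{})
		}
		seen[id][loc.Address] = struct{}{}
	}
	out := make(map[string][]uint64, len(seen))
	for id, set := range seen {
		addrs := make([]uint64, 0, len(set))
		for a := range set {
			addrs = append(addrs, a)
		}
		sort.Slice(addrs, func(i, j int) bool { return addrs[i] < addrs[j] })
		out[id] = addrs
	}
	return out
}

func main() {
	mappings := []Mapping{{ID: 1, BuildID: "aa"}, {ID: 2, BuildID: "bb"}, {ID: 3, BuildID: "cc"}}
	locations := []Location{
		{MappingID: 1, Address: 0x20},
		{MappingID: 1, Address: 0x10},
		{MappingID: 1, Address: 0x20}, // duplicate, resolved only once
		{MappingID: 2, Address: 0x30},
	}
	for id, addrs := range addressesByBuildID(mappings, locations) {
		fmt.Printf("%s: %d unique addresses\n", id, len(addrs))
	}
}
```

With this shape, the amount of symbolization work is bounded by the truncated location set rather than by the number of blocks the query touches.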

I won't delve too deeply into how we fetch and process ELF/DWARF files, but I strongly doubt we can bypass the need for an intermediate representation optimized for our access patterns. Additionally, we need a solution to prevent concurrent access to the debuginfod service.
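
One possible reading of the concurrency concern is that simultaneous requests for the same build ID should be coalesced into a single upstream fetch. The sketch below shows that idea with a small, standard-library-only request coalescer; `Group` and `Do` are hypothetical names, and this is an assumption about the intent rather than the design chosen in the PR.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call tracks one in-flight fetch and its result.
type call struct {
	wg  sync.WaitGroup
	val []byte
	err error
}

// Group coalesces concurrent calls that share a key. It is not a cache:
// once a call finishes, the next caller triggers a new fetch.
type Group struct {
	mu    sync.Mutex
	calls map[string]*call
}

// Do runs fn once per key at a time; callers that arrive while a fetch
// is in flight wait for it and share its result.
// Caveat: if fn panics, waiting callers are never released.
func (g *Group) Do(key string, fn func() ([]byte, error)) ([]byte, error) {
	g.mu.Lock()
	if g.calls == nil {
		g.calls = make(map[string]*call)
	}
	if c, ok := g.calls[key]; ok {
		g.mu.Unlock()
		c.wg.Wait()
		return c.val, c.err
	}
	c := &call{}
	c.wg.Add(1)
	g.calls[key] = c
	g.mu.Unlock()

	c.val, c.err = fn()
	c.wg.Done()

	g.mu.Lock()
	delete(g.calls, key)
	g.mu.Unlock()
	return c.val, c.err
}

func main() {
	var g Group
	var fetches int32
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = g.Do("fbce2598", func() ([]byte, error) {
				atomic.AddInt32(&fetches, 1)
				time.Sleep(50 * time.Millisecond) // stand-in for a slow download
				return []byte("debuginfo"), nil
			})
		}()
	}
	wg.Wait()
	fmt.Println("upstream fetches:", atomic.LoadInt32(&fetches))
}
```

This only deduplicates within one process; limiting load across several replicas would need a shared store or a dedicated service in front of debuginfod.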

pkg/phlaredb/symdb/resolver_tree.go

pkg/experiment/query_backend/backend.go

korniltsev · 2025-01-17T10:49:24Z

I have not looked into the code yet, but I've tried to run it locally and it looks like it's trying to load a lot of unnecessary debug files.

I ran the eBPF profiler with no on-target symbolization and also ran a simple python -m http.server to mock debuginfod responses.

I then queried only one executable: process_cpu:cpu:nanoseconds:cpu:nanoseconds{service_name="unknown", process_executable_path="/home/korniltsev/.cache/JetBrains/IntelliJIdea2024.2/tmp/GoLand/___go_build_go_opentelemetry_io_ebpf_profiler"}

I see 268 GET requests: 13 requests to "GET /buildid/fbce2598b34f1cf8d0c899f34c2218864e1da6d1/debuginfo HTTP/1.1" 200 - (the profiler binary I put into the mock server for testing) and a bunch of 404s, which I assume are build IDs for the files in other processes that the query does not target.

Other than that, it works \M/ Can't wait to run it in dev.

liaol · 2025-02-18T11:39:07Z

Hi @marcsanmi , when can this PR be merged?
Thanks

marcsanmi · 2025-02-19T17:32:42Z

Hi @liaol,
It's still going to take a little while :)

marcsanmi · 2025-03-03T16:17:01Z

I've created this diagram to outline the current symbolization architecture:

flowchart TD
    A[SymbolizePprof] --> B{Group by Mapping}
    B --> C[Symbolize Request]
    
    C --> D{Check Symbol Cache}
    
    subgraph "Symbol Cache Layer (LRU, in-memory)"
        D -->|Cache Hit| E[Return Cached Symbols]
        D -->|Cache Miss| F
    end
    
    F{Check Debug Info Cache} 
    
    subgraph "Debug Info Cache Layer (Ristretto, in-memory)"
        F -->|Cache Hit| G[Read from Debug Info Cache]
        F -->|Cache Miss| H
    end
    
    subgraph "Persistent Storage Layer"
        H{Check Object Store}
        H -->|Cache Hit| I[Read from Object Store]
        H -->|Cache Miss| J[Fetch from Debuginfod]
    end
    
    I --> K[Store in Debug Info Cache]
    J --> L[Store in Debug Info Cache]
    J --> M[Store in Object Store]
    
    G --> N[Parse ELF/DWARF]
    K --> N
    L --> N
    
    subgraph "DWARF Resolution Layer"
        N --> O[Resolve Addresses]
        O --> P{Check Address Map}
        P -->|Map Hit| Q[Return from Map]
        P -->|Map Miss| R[Parse DWARF Data]
        R --> S[Build Lookup Tables]
        S --> T[Store in Address Map]
        T --> U[Return Symbols]
        Q --> U
    end
    
    U --> V[Update Symbol Cache]
    V --> W[Return to Caller]
    E --> W

pkg/test/integration/testdata/otel-ebpf-profiler-offcpu-cpu.json

pkg/experiment/symbolizer/metrics.go

pkg/experiment/symbolizer/debuginfod_client.go

pkg/experiment/symbolizer/cache.go

pkg/experiment/symbolizer/debuginfod_client.go

pkg/experiment/symbolizer/reader.go

kolesnikovae · 2025-03-04T06:21:59Z

I might be missing some details, but I have doubts about the cache hierarchy.

Now it looks like we have: symbols_cache -> object_store -> in_memory_object_store (ristretto) -> debuginfod.

As far as I understand, we're going to read from object_store even if there's just a single unresolved address.

I expected to see: symbols_cache -> in_memory_object_store (ristretto) -> object_store -> debuginfod.

Could you please elaborate on the decision?

cmd/pyroscope/help-all.txt.tmpl

marcsanmi · 2025-03-04T18:54:37Z

> I might be missing some details, but I have doubts about the cache hierarchy.

You're right @kolesnikovae. I've just realized the problem is that the ristretto cache is coupled inside the debuginfod client. I'll decouple it and place it at the symbolizer level. That way, we'll have the following path:

symbols_cache -> in_memory_object_store (ristretto) -> object_store -> debuginfod
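
The ordering above can be sketched as a read-through lookup that tries the cheapest tier first and backfills faster tiers on a hit. This is a minimal illustration covering only the debug-file tiers (in-memory store, object store, then the origin). The `Store` interface, `mapStore`, and `Tiered` are hypothetical, and the symbol cache, which holds resolved symbols rather than files, is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a hypothetical key/value tier holding raw debug files.
type Store interface {
	Get(key string) ([]byte, bool)
	Put(key string, val []byte)
}

// mapStore is an in-process stand-in for either tier; reads counts lookups.
type mapStore struct {
	data  map[string][]byte
	reads int
}

func newMapStore() *mapStore { return &mapStore{data: make(map[string][]byte)} }

func (s *mapStore) Get(key string) ([]byte, bool) {
	s.reads++
	v, ok := s.data[key]
	return v, ok
}

func (s *mapStore) Put(key string, val []byte) { s.data[key] = val }

// Tiered consults tiers in order (fastest first) and falls back to origin,
// for example a debuginfod fetch, only when every tier misses.
type Tiered struct {
	tiers  []Store
	origin func(key string) ([]byte, error)
}

func (t *Tiered) Get(key string) ([]byte, error) {
	for i, s := range t.tiers {
		if v, ok := s.Get(key); ok {
			// Backfill the faster tiers that missed.
			for _, upper := range t.tiers[:i] {
				upper.Put(key, v)
			}
			return v, nil
		}
	}
	if t.origin == nil {
		return nil, errors.New("no origin configured")
	}
	v, err := t.origin(key)
	if err != nil {
		return nil, err
	}
	for _, s := range t.tiers {
		s.Put(key, v)
	}
	return v, nil
}

func main() {
	mem, obj := newMapStore(), newMapStore()
	fetches := 0
	tc := &Tiered{
		tiers: []Store{mem, obj}, // in-memory first, then object store
		origin: func(key string) ([]byte, error) {
			fetches++
			return []byte("elf:" + key), nil
		},
	}
	tc.Get("abc") // misses both tiers, fetches from origin
	tc.Get("abc") // served from memory, object store untouched
	fmt.Println("origin fetches:", fetches, "object store reads:", obj.reads)
}
```

With the in-memory tier in front, a single unresolved address no longer forces a read from the object store when the file is already resident.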

pkg/experiment/ingester/segment.go

pkg/frontend/read_path/query_frontend/query_frontend.go

pkg/test/integration/testdata/otel-ebpf-profiler-offcpu.json.bin

korniltsev reviewed Dec 20, 2024

pkg/experiment/symbolization/debuginfod_client.go (outdated, resolved)

marcsanmi force-pushed the marcsanmi/symbolization-poc branch from efdde88 to 6b009d3 Compare January 16, 2025 12:15

marcsanmi requested review from korniltsev and petethepig January 16, 2025 12:20

korniltsev reviewed Jan 17, 2025

pkg/phlaredb/symdb/resolver.go (outdated, resolved)

kolesnikovae reviewed Jan 17, 2025

marcsanmi changed the title ~~POC feat(symbolization): Add DWARF symbolization with debuginfod support~~ Prototype(symbolization): Add symbolization for unsymbolized profiles in Pyroscope read path Jan 19, 2025

marcsanmi changed the title ~~Prototype(symbolization): Add symbolization for unsymbolized profiles in Pyroscope read path~~ Prototype(symbolization): Add symbolization in Pyroscope read path Jan 19, 2025

marcsanmi force-pushed the marcsanmi/symbolization-poc branch 2 times, most recently from 2937519 to 0b8a289 Compare February 19, 2025 17:29

marcsanmi requested a review from kolesnikovae February 19, 2025 17:31

marcsanmi requested a review from korniltsev February 20, 2025 07:59

marcsanmi force-pushed the marcsanmi/symbolization-poc branch 2 times, most recently from 7c2ab09 to 87a481c Compare March 3, 2025 15:34

kolesnikovae reviewed Mar 4, 2025

pkg/test/integration/testdata/otel-ebpf-profiler-offcpu-cpu.json (outdated, resolved)

kolesnikovae reviewed Mar 4, 2025

alsoba13 reviewed Mar 4, 2025

cmd/pyroscope/help-all.txt.tmpl (outdated, resolved)

marcsanmi force-pushed the marcsanmi/symbolization-poc branch 2 times, most recently from fd39d9e to 65fb599 Compare March 5, 2025 15:49

marcsanmi requested a review from kolesnikovae March 5, 2025 16:52

marcsanmi force-pushed the marcsanmi/symbolization-poc branch from 65fb599 to 10f52ea Compare March 17, 2025 08:52

kolesnikovae reviewed Mar 17, 2025

pkg/experiment/ingester/segment.go (outdated, resolved)

liaol reviewed Mar 20, 2025

pkg/frontend/read_path/query_frontend/query_frontend.go (outdated, resolved)

kolesnikovae reviewed Mar 20, 2025

pkg/test/integration/testdata/otel-ebpf-profiler-offcpu.json.bin (outdated, resolved)

marcsanmi added 29 commits May 8, 2025 08:07

Add Lidia binary layout support

c9abdc1

Remove unnecessary debuginfod client raw files cache & address severa…

e619614

…l review comments

Use bytes.Reader and io.NopCloser

269c5a7

first removal address check

564e19a

feat(symbolization): Add DWARF symbolization POC with debuginfod support

685e39d

fix lint errors

44f28aa

Add symbolization inside the read path

abea8ce

Add cache for debug files

e2f61b0

Adding symbolizer instrumentation

240bf17

chore: move symbolizatoin into query frontend & add intermediate repr…

ef4e67b

…esentation for lookups in dwarf

fix lint and unstaged-changes check

860a4c6

Update symbolization to symbolize from pprof & improve deubginfod client

94e9e61

Add new flags help

d5ee8d6

Add multi-layered caching with LRU for symbols, Ristretto for debugin…

ddbe3c5

…fo, and optimized DWARF parsing

fix another set of tests related to otlp conversion and help options

31ec26d

Address review several comments

20f1341

place symbolizer flags registration under v2 umbrella

466540e

Add __needs_symbolization__ metadata label & made debuginfo store non…

46f453e

…-optional

update metadata labels to work with needs_symbolization & fix maxNodes

255e180

Address review and several fixes:

b21425c

* Fix generation of function IDs * Store debuginfo requests 404s in symbol cache * Add fallback symbols for unsymbolizable profiles

TODO: review this commit, it has logs and otel only smbolization disa…

65e169b

…bled

Change lidia write to accept interface

5ab1299

Use new Lidia format insitead of dwarf

fb3254e

Add Lidia binary layout support

efb2784

Remove unnecessary debuginfod client raw files cache & address severa…

da2d95b

…l review comments

feat: add symbolizer per tenant overrides (#4136)

749b778

* feat: add symbolizer per tenant overrides

Add 404 caching layer

28f82ac

Address several review comments

facb0a4

Add symbolizer enabled tenant flag cli support and clean minor things

e66f976

marcsanmi force-pushed the marcsanmi/symbolization-poc branch from 829f81e to e66f976 Compare May 8, 2025 06:19
